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Abstract 

This work presents a distributed method for control centers to monitor the operating condition of a power network, i.e., to 
estimate the network state, and to ultimately determine the occurrence of threatening situations. State estimation has been 
recognized to be a fundamental task for network control centers to ensure correct and safe functionalities of power grids. 
We consider (static) state estimation problems, in which the state vector consists of the voltage magnitude and angle at 
all network buses. We consider the state to be linearly related to network measurements, which include power flows, current 
injections, and voltages phasors at some buses. We admit the presence of several cooperating control centers, and we design two 
distributed methods for them to compute the minimum variance estimate of the state given the network measurements. The 
two distributed methods rely on different modes of cooperation among control centers: in the first method an incremental mode 
of cooperation is used, whereas, in the second method, a diffusive interaction is implemented. Our procedures, which require 
each control center to know only the measurements and structure of a subpart of the whole network, are computationally 
efficient and scalable with respect to the network dimension, provided that the number of control centers also increases with 
the network cardinality. Additionally, a finite-memory approximation of our diffusive algorithm is proposed, and its accuracy 
is characterized. Finally, our estimation methods are exploited to develop a distributed algorithm to detect corrupted data 
among the network measurements. 



1 Introduction 

Large-scale complex systems, such as, for instance, the electrical power grid and the telecommunication system, 
are receiving increasing attention from researchers in different fields. The wide spatial distribution and the high 
dimensionality of these systems preclude the use of centralized solutions to tackle classical estimation, control, 
and fault detection problems, and they require, instead, the development of new decentralized techniques. One 
possibility to overcome these issues is to geographically deploy some monitors in the network, each one responsible 
for a different subpart of the whole system. Local estimation and control schemes can successively be used, together 
with an information exchange mechanism to recover the performance of a centralized scheme. 

1.1 Control centers, state estimation and cyber security in power networks 

Power systems are operated by system operators from the area control center. The main goal of the system operator 
is to maintain the network in a secure operating condition, in which all the loads are supplied power by the generators 
without violating the operational limits on the transmission lines. In order to accomplish this goal, at a given point 
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Fig. 1. In Fig. 1(a), remote terminal units (RTU) transmit their measurements to a SCADA terminal via a LAN network. 
The data is then sent to a Control Center to implement network estimation, control, and optimization procedures. Fig. 1(b) 
shows the diagram of the IEEE 118 bus system (courtesy of the IIT Power Group). The network has 118 buses, 186 branches, 
99 loads, and 54 generators. 



in time, the network model and the phasor voltages at every system bus need to be determined, and preventive 
actions have to be taken if the system is found in an insecure state. For the determination of the operating state, 
remote terminal units and measuring devices are deployed in the network to gather measurements. These devices 
are then connected via a local area network to a SCADA (Supervisory Control and Data Acquisition) terminal, 
which supports the communication of the collected measurements to a control center. At the control center, the 
measurement data is used for control and optimization functions, such as contingency analysis, automatic generation 
control, load forecasting, optimal power flow computation, and reactive power dispatch [1]. A diagram representing 
the interconnections between remote terminal units and the control center is reported in Fig. 1(a). Various sources 
of uncertainties, e.g., measurement and communication noise, lead to inaccuracies in the received data, which may 
affect the performance of the control and optimization algorithms, and, ultimately, the stability of the power plant. 
This concern was first recognized and addressed in [27,28,29] by introducing the idea of (static) state estimation in 
power systems. 



Power network state estimators are broadly used to obtain an optimal estimate from redundant noisy measurements, 
and to estimate the state of a network branch which, for economical or computational reasons, is not directly 
monitored. For the power system state estimation problem, several centralized and parallel solutions have been 
developed in the last decades, e.g., see [19,8,30]. Being an online function, computational issues, storage requirements, 
and numerical robustness of the solution algorithm need to be taken into account. Within this regard, distributed 
algorithms based on network partitioning techniques are to be preferred over centralized ones. Moreover, even in 
decentralized setting, the work in [20] on the blackout of August 2003 suggests that an estimation of the entire 
network is essential to prevent networks damages. In other words, the whole state vector should be estimated by 
and available to every unit. The references [34,12] explore the idea of using a global control center to coordinate 
estimates obtained locally by several local control centers. In this work, we improve upon these prior results by 
proposing a fully decentralized and distributed estimation algorithm, which, by only assuming local knowledge of 
the network structure by the local control centers, allows them to obtain in finite time an optimal estimate of the 
network state. Being the computation distributed among the control centers, our procedure appears scalable against 
the power network dimension, and, furthermore, numerically reliable and accurate. 



A second focus of this paper is false data detection and cyber attacks in power systems. Because of the increasing 
reliance of modern power systems on communication networks, the possibility of cyber attacks is a real threat 
[18]. One possibility for the attacker is to corrupt the data coming from the measuring units and directed to the 
control center, in order to introduce arbitrary errors in the estimated state, and, consequently, to compromise the 
performance of control and optimization algorithms [14]. This important type of attack is often referred in the 
power systems literature to as false data injection attack. Recently, the authors of [33] show that a false data 
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injection attack, in addition to destabilizing the grid, may also lead to fluctuations in the electricity market, causing 
significant economical losses. The presence of false data is classically checked by analyzing the statistical properties 
of the estimation residual z — Hx, where z is the measurements vector, a; is a state estimate, and H is the state to 
measurements matrix. For an attack to be successful, the residual needs to remain within a certain confidence level. 
Accordingly, one approach to circumvent false data injection attacks is to increase the number of measurements so 
as to obtain a more accurate confidence bound. Clearly, by increasing the number of measurements, the data to be 
transmitted to the control center increases, and the dimension of the estimation problem grows. By means of our 
estimation method, we address this dimensionality problem by distributing the false data detection problem among 
several control centers. 

1.2 Related work on distributed estimation and projection methods 

Starting from the eighties, the problem of distributed estimation has attracted intense attention from the scientific 
community, generating along the years a very rich literature. More recently, because of the advent of highly integrated 
and low-cost wireless devices as key components of large autonomous networks, the interest for this classical topic has 
been renewed. For a wireless sensor network, novel applications requiring efficient distributed estimation procedures 
include, for instance, environment monitoring, surveillance, localization, and target tracking. Considerable effort has 
been devoted to the development of distributed and adaptive filtering schemes, which generalize the notion of adaptive 
estimation to a setup involving networked sensing and processing devices [4]. In this context, relevant methods include 
incremental Least Mean-Square [15], incremental Recursive Least-Square [24], Diffusive Least Mean-Square [24], and 
Diffusive Recursive Least-Square [4]. Diffusion Kalman filtering and smoothing algorithms are proposed, for instance, 
in [3,5], and consensus based techniques in [25,26]. We remark that the strategies proposed in the aforementioned 
references could be adapted for the solution of the power network static estimation problem. Their assumptions, 
however, appear to be not well suited in our context for the following reasons. First, the convergence of the above 
estimation algorithms is only asymptotic, and it depends upon the communication topology. As a matter of fact, for 
many communication topologies, such as Cayley graphs and random geometric graphs, the convergence rate is very 
slow and scales badly with the network dimension. Such slow convergence rate is clearly undesirable because a delayed 
state estimation could lead the power plant to instability. Second, approaches based on Kalman filtering require the 
knowledge of the global state and observation model by all the components of the network, and they violate therefore 
our assumptions. Third and finally, the application of these methods to the detection of cyber attacks, which is also 
our goal, is not straightforward, especially when detection guarantees arc required. An exception is constituted by 
[31], where a estimation technique based on local Kalman filters and a consensus strategy is developed. This latter 
method, however, besides exhibiting asymptotic convergence, does not offer guarantees on the final estimation error. 

Our estimation technique belongs to the family of Kaczmarz (row-projection) methods for the solution of a linear 
system of equations. See [13,11,32,6] for a detailed discussion. Differently from the existing row-action methods, our 
algorithms exhibit finite time convergence towards the exact solution, and they can be used to compute any weighted 
least squares solution to a system of linear equations. 

1.3 Our contributions 

The contributions of this work are threefold. First, we adopt the static state network estimation model, in which the 
state vector is linearly related to the network measurements. We develop two methods for a group of interconnected 
control centers to compute an optimal estimate of the system state via distributed computation. Our first estimation 
algorithm assumes an incremental mode of cooperation among the control centers, while our second estimation 
algorithm is based upon a diffusive strategy. Both methods are shown to converge in a finite number of iterations, 
and to require only local information for their implementation. Differently than [23], our estimation procedures 
assume neither the measurement error covariance nor the measurements matrix to be diagonal. Furthermore, our 
algorithms are advantageous from a communication perspective, since they reduce the distance between remote 
terminal units and the associated control center, and from a computational perspective, since they distribute the 
measurements to be processed among the control centers. Second, as a minor contribution, we describe a finite-time 
algorithm to detect via distributed computation if the measurements have been corrupted by a malignant agent. Our 
detection method is based upon our state estimation technique, and it inherits its convergence properties. Notice that, 
since we assume the measurements to be corrupted by noise, the possibility exists for an attacker to compromise the 
network measurements while remaining undetected (by injecting for instance a vector with the same noise statistics). 
With respect to this limitation, we characterize the class of corrupted vectors that are guaranteed to be detected 
by our procedure, and we show optimality with respect to a centralized detection algorithm. Third, we study the 
scalability of our methods in networks of increasing dimension, and we derive a finite-memory approximation of our 
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diffusive estimation strategy. For this approximation procedure we show that, under a reasonable set of assumptions 
and independent of the network dimension, each control center is able to recover a good approximation of the state 
of a certain subnetwork through little computation. Moreover, we provide bounds on the approximation error for 
each subnetwork. Finally, we illustrate the effectiveness of our procedures on the IEEE 118 bus system. 

The rest of the paper is organized as follows. In Section 2 we introduce the problem under consideration, and we 
describe the mathematical setup. Section 3 contains our main results on the state estimation and on the detection 
problem, as well as our algorithms. Section 4 describes our approximated state estimation algorithm. In Section 5 we 
study the IEEE 118 bus system, and we present some simulation results. Finally, Section 6 contains our conclusion. 



2 Problem setup and preliminary notions 

For a power network, an example of which is reported in Fig. 1(b), the state at a certain instant of time consists 
of the voltage angles and magnitudes at all the system buses. The (static) state estimation problem introduced 
in the seminal work by Schweppe [27] refers to the procedure of estimating the state of a power network given a 
set of measurements of the network variables, such as, for instance, voltages, currents, and power flows along the 
transmission lines. To be more precise, let x £ R" and zeP be, respectively, the state and measurements vector. 
Then, the vectors x and z are related by the relation 

z = h(x) + 7], (1) 

where h(-) is a nonlinear measurement function, and where ?y, which is traditionally assumed to be a zero mean 
random vector satisfying E[?7?7 T ] = E J; = E^ > 0, is the noise measurement. An optimal estimate of the network state 
coincides with the most likely vector x that solves equation (1). It should be observed that, instead of by solving the 
above estimation problem, the network state could be obtained by measuring directly the voltage phasors by means 
of phasor measurement devices |j] Such an approach, however, would be economically expensive, since it requires to 
deploy a phasor measurement device at each network bus, and it would be very vulnerable to communication failures 
[1] . In this work, we adopt the approximated estimation model presented in [28] , which follows from the linearization 
around the origin of equation (1). Specifically, 

z = Hx + v, (2) 

where H £ M. pxn and where v, the noise measurement, is such that K[v] — andi?[iw T ] = E = E T > 0. Observe 
that, because of the interconnection structure of a power network, the measurement matrix H is usually sparse. Let 
Ker(i?) denote the null space defined by the matrix H. For the equation (2), without affecting generality, assume 
Ker(iJ) = {0}, and recall from [17] that the vector 

x wls = (F T E- 1 J H')- 1 F T S- 1 2 (3) 

minimizes the weighted variance of the estimation error, i.e., x w \ s — argmin^(z — iJi) T E~ 1 (z — Hx). 

The centralized computation of the minimum variance estimate to (2) assumes the complete knowledge of the matrices 
H and E, and it requires the inversion of the matrix 77 T S _1 _ff . For a large power network, such computation imposes 
a limitation on the dimension of the matrix H, and hence on the number of measurements that can be efficiently 
processed to obtain a real-time state estimate. Since the performance of network control and optimization algorithms 
depend upon the precision of the state estimate, a limitation on the network measurements constitutes a bottleneck 
toward the development of a more efficient power grid. A possible solution to address this complexity problem is 
to distribute the computation of x w \ s among geographically deployed control centers (monitors), in a way that each 
monitor is responsible for a subpart of the whole network. To be more precise, let the matrices H and E, and the 



1 Phasor measurement units are devices that synchronize by using GPS signals, and that allow for a direct measurement of 
voltage and current phasors. 
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where, for i G {1, .. . , m}, m, e N, fli G R m > xn , E, e M m ' xp , e M ro S and ^™ i "*i = P- Let G = (V,£) be a 
connected graph in which each vertex i e V = {1, . . . , m} denotes a monitor, and £ £ V x V denotes the set of 
monitors interconnections. For i £ {1, ... ,m}, assume that monitor i knows the matrices Hi, E*, and the vector zj. 
Moreover, assume that two neighboring monitors are allowed to cooperate by exchanging information. Notice that, if 
the full matrices H and E are nowhere available, and if they cannot be used for the computation of x w i s , then, with 
no cooperation among the monitors, the vector x w \ s cannot be computed by any of the monitor. Hence we consider 
the following problem. 

Problem 1 (Distributed state estimation) Design an algorithm for the monitors to compute the minimum vari- 
ance estimate of the network state via distributed computation. 

We now introduce the second problem addressed in this work. Given the distributed nature of a power system and 
the increasing reliance on local area networks to transmit data to a control center, there exists the possibility for an 
attacker to compromise the network functionality by corrupting the measurements vector. When a malignant agent 
corrupts some of the measurements, the state to measurements relation becomes 

z = Hx + v + w, 

where the vector w € IR P is chosen by the attacker, and, therefore, it is unknown and unmeasurable by any of the 
monitoring stations. We refer to the vector w to as false data. From the above equation, it should be observed that 
there exist vectors w that cannot be detected through the measurements z. For instance, if the false data vector is 
intentionally chosen such that w G Im(iJ), then the attack cannot be detected through the measurements z. Indeed, 
denoting with t the pseudoinverse operation, the vector x + H^w is a valid network state. In this work, we assume 
that the vector w is detectable from the measurements z, and we consider the following problem. 

Problem 2 (Distributed detection) Design an algorithm for the monitors to detect the presence of false data in 
the measurements via distributed computation. 

As it will be clear in the sequel, the complexity of our methods depends upon the dimension of the state, as well 
as the number of monitors. In particular, few monitors should be used in the absence of severe computation and 
communication contraints, while many monitors arc preferred otherwise. We believe that a suitable choice of the 
number of monitors depends upon the specific scenario, and it is not further discussed in this work. 

Remark 1 (Generality of our methods) In this paper we focus on the state estimation and the false data detec- 
tion problem for power systems, because this field of research is currently receiving sensible attention from different 
communities. The methods described in the following sections, however, are general, and they have applicability 
beyond the power network scenario. For instance, our procedures can be used for state estimation and false data 
detection in dynamical system, as described in [21] for the case of sensors networks. 

3 Optimal state estimation and false data detection via distributed computation 

The objective of this section is the design of distributed methods to compute an optimal state estimate from 
measurements. With respect to a centralized method, in which a powerful central processor is in charge of processing 



2 In most application the error covariance matrix is assumed to be diagonal, so that each submatrix Ej is very sparse. 
However, we do not impose any particular structure on the error covariance matrix. 
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Algorithm 1 Incremental minimum norm solution (i-th monitor) 
Input: Hi, z,; 

Require: [zj . . . 4J T G Im([Hj . . . Hl] T ): 
if i = 1 then 

x := 0, if := 
else 

receive x^i and from monitor i — 1; 

end if 

:= + ii'i_i(iJ iJ ft: i _i) t (^ i - HiXi-i); 
Ki := Basis^-iKer^^.!)); 

if i < m then 

transmit and to monitor i + 1; 
else 

return x m ; 
end if 



all the data, our procedures require the computing units to have access to only a subset of the measurements, and 
are shown to reduce significantly the computational burden. In addition to being convenient for the implementation, 
our methods are also optimal, in the sense that they maintain the same estimation accuracy of a centralized method. 

For a distributed method to be implemented, the interaction structure among the computing units needs to be 
defined. Here we consider two modes of cooperations among the computing units, and, namely, the incremental and 
the diffusive interaction. In an incremental mode of cooperation, information flows in a sequential manner from one 
node to the adjacent one. This setting, which usually requires the least amount of communications [22], induces a 
cyclic interaction graph among the processors. In a diffusive strategy, instead, each node exchanges information with 
all (or a subset of) its neighbors as defined by an interaction graph. In this case, the amount of communication and 
computation is higher than in the incremental case, but each node possesses a good estimate before the termination 
of the algorithm, since it improves its estimate at each communication round. This section is divided into three parts. 
In Section 3.2, we first develop a distributed incremental method to compute the minimum norm solution to a set 
of linear equations, and then exploit such method to solve a minimum variance estimation problem. In Section 3.3 
we derive a diffusive strategy which is amenable to asynchronous implementation. Finally, in Section 3.4 we propose 
a distributed algorithm for the detection of false data among the measurements. Our detection procedure requires 
the computation of the minimum variance state estimate, for which either the incremental or the diffusive strategy 
can be used. 

3.1 Incremental solution to a set of linear equations 

We start by introducing a distributed incremental procedure to compute the minimum norm solution to a set of 
linear equations. This procedure constitutes the key ingredient of the incremental method we later propose to solve 
the minimum variance estimation problem. Let H G W xm , and let z G Im(_ff), where Im(_ff) denotes the range space 
spanned by the matrix H . Consider the system of linear equations z — Hx, and recall that the unique minimum 
norm solution to z = Hx coincides with the vector x such that z — Hx and ||x||2 is minimum. It can be shown 
that ||a;||2 being minimum corresponds to x being orthogonal to the null space Ker(H) of H [17]. Let H and z be 
partitioned in m blocks as in (4), and let G = (V, £) be a directed graph such that V = {1, . . . , to} corresponds to the 
set of monitors, and, denoting with (i, j) the directed edge from j to i, £ = {(i + 1, i) : i = 1, . . . , m — 1} U {(1, to)}. 
Our incremental procedure to compute the minimum norm solution to z = Hx is in Algorithm 1, where, given a 
subspacc V, we write Basis(V) to denote any full rank matrix whose columns span the subspace V. We now proceed 
with the analysis of the convergence properties of the Incremental minimum norm solution algorithm. 

Theorem 3.1 (Convergence of Algorithm 1) Let z = Hx, where H and z are partitioned in to row-blocks as in (4). 
In Algorithm 1, the m-th monitor returns the vector x such that z = Hx and x _L Ker(iJ). 

PROOF. See Section 6.1. 



It should be observed that the dimension of Ki decreases, in general, when the index i increases. In particular, 
K m = {0} and K x = Ke^i/i). To reduce the computational burden of the algorithm, monitor i could transmit the 
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smallest among Basis(iQ_i Kev(HiKi-i)) and Basis(i^i_i Ker(i/j^_i)- L ), together with a packet containing the 
type of the transmitted basis. 

Remark 2 (Computational complexity of Algorithm 1) In Algorithm 1, the main operation to be performed 
by the i-th agent is a singular value decomposition ( S VP ) PH Indeed, since the range space and the null space of a 
matrix can be obtained through its SVD, both the matrices {HlKi-{y and Basis(ifi_i Ker(.ffj.K,-_i)) can be recovered 
from the SVD of HiKi-i. Let H € W mxn , to > n, and assume the presence of \m/k~\ monitors, 1 < k < m. Recall 
that, for a matrix M € M fexp ; the singular value decomposition can be performed with complexity 0(mm{kp 2 , k 2 p}) 
[10]. Hence, the computational complexity of computing a minimum norm solution to the system z = Hx is 0{mn 2 ). 
In Table 1 we report the computational complexity of Algorithm 1 as a function of the size k. 

Table 1 

Computational complexity of Algorithm 1. 



Block size 


J-th complexity 


Total complexity 


Communications 


k < n 


0{k 2 n) 


0(mkn) 


\m/k] - 1 


k > n 


0(kn 2 ) 


0(mn 2 ) 


\m/k] - 1 



The following observations are in order. First, if k < n, then the computational complexity sustained by the i-th 
monitor is much smaller than the complexity of a centralized implementation, i.e., 0(k 2 n) <C 0(mn 2 ). Second, the 
complexity of the entire algorithm is optimal, since, in the worst case, it maintains the computational complexity of 
a centralized solution, i.e., O(mkn) < 0(mn 2 ). Third and finally, a compromise exists between the blocks size k and 
the number of communications needed to terminate Algorithm 1. In particular, if k = m, then no communication is 
needed, while, if k = 1, then to — 1 communication rounds are necessary to terminate the estimation algorithm^] 

3.2 Incremental state estimation via distributed computation 

We now focus on the computation of the weighted least squares solution to a set of linear equations. Let v be an 
unknown and unmeasurable random vector, with E(v) = and E(w T ) = £ = E T > 0. Consider the system of 
equations 



z = Hx + v, 



(5) 



and assume Ker(_ff) = 0. Notice that, because of the noise vector v, we generally have z ^ lm(H), so that Algorithm 
1 cannot be directly employed to compute the vector x w \ s defined in (3). It is possible, however, to recast the above 
weighted least squares estimation problem to be solvable with Algorithm 1. Note that, because the matrix S is 
symmetric and positive definite, there exists]^] a full row rank matrix B such that £ = BB T . Then, equation (5) can 
be rewritten as 



H sB 



(G) 



where e € M>o, E[V] = and E[uu T ] = e 2 I. Observe that, because B has full row rank, the system (6) is 
underdetermined, i.e., z G Im([H eB]) and Kei([H eB}) ^ 0. Let 



x(s) 
v 



H eB 



t 



(7) 



The following theorem characterizes the relation between the minimum variance estimation x w \ s and x(e). 



3 The matrix H is usually very sparse, since it reflects the network interconnection structure. Efficient SVD algorithms for 
very large sparse matrices are being developed (cf. SVDPACK). 

4 Additional m — 1 communication rounds are needed to transmit the estimation to every other monitor. 

5 Choose for instance B = WA 1 ^ 2 , where IT" is a basis of eigenvectors of E and A is the corresponding diagonal matrix of the 
eigenvalues. 
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Theorem 3.2 (Convergence with e) Consider the system of linear equations z = Hx + v. Let E(v) = and 
E(vv T ) = S = BB T > for a full row rank matrix B. Let 

C = e(I — HH^)B, E^L-C^C, D = eE[L + e 2 EB 1 ~(HH J ) ] BE^B 1 \HH T )\l - eBCft). 



Then 



H eB 



fft _ eiJt S (ct + D) 
Ct + D 



and 



lim H^ - sH^B(C^ + D) = (H T T,- 1 H)- 1 H T Y 1 - 1 . 

£^0 + 



PROOF. Sec Section 6.2. 



Throughout the paper, let x(e) be the vector defined in (7), and notice that Theorem 3.2 implies that 

2! w i s = lim x(e). 

Remark 3 (Incremental state estimation) For the system of equations z = Hx + v, let BB T be the covariance 
matrix of the noise vector v, and let 









B 1 




Zl 












Z2 


H = 




, B = 




, z = 






_H m _ 




_B m _ 




_ ~m _ 



where m, e N, Hi e R m ' xn , Bi e R miXp , and Zi <G R mi . For e > 0, the estimate x(e) of the weighted least squares 
solution to z — Hx + v can be computed by means of Algorithm 1 with input [Hi eBi] and z,- L . 



Observe now that the estimate x(e) coincides with cc w i s only in the limit for e — > + . When the parameter e is fixed, 
the estimate x(e) differs from the minimum variance estimate x w \ s . We next characterize the approximation error 

*£\vls x{e) • 

Corollary 3.1 (Approximation error) Consider the system z = Hx + v, and let E[vv T ] = BB T for a full row rank 
matrix B. Then 

x wis — x(s) — eH^BDz, 

where D is as in Theorem 3.2. 



PROOF. With the same notation as in the proof of Theorem 3.2, for every value of e > 0, the difference x w i s — x(e) 
equals ((H T Y,- 1 H)- 1 H T Y,- 1 - H f + eH^B{C^ + D)) z. Since (H T Y<~ 1 H)~ 1 H T 'E~ 1 - + eH^BC^ = for every 
e > 0, it follows x w i s — x(e) = eH^BDz. 



Therefore, for the solution of system (5) by means of Algorithm 1, the parameter e is chosen according to Corollary 
3.1 to meet a desired estimation accuracy. It should be observed that, even if the entire matrix H needs to be known 
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for the computation of the exact parameter e, the advantages of our estimation technique are preserved. Indeed, if 
the matrix H is unknown and an upper bound for \\H^ BDz\\ is known, then a value for e can still be computed that 
guarantees the desired estimation accuracy. On the other hand, even if H is entirely known, it may be inefficient 
to use H to perform a centralized state estimation over time. Instead, the parameter e needs to be computed only 
once. To conclude this section, we characterize the estimation residual z — Hx. This quantity plays an important 
role for the synthesis of a distributed false data detection algorithm. 

Corollary 3.2 (Estimation residual) Consider the system z = Hx + v, and let E[w T ] = £ = E T > 0. TTterp] 

lim \\z-Hx(e)\\ < \\(I - HW)\\\\v\\, 

£->0 + 

where W = {H T T,- 1 H)- 1 H T T,- 1 . 



PROOF. By virtue of Theorem 3.2 we have lim e ^ 0+ x(e) = x w i s = (H r '£- 1 H)- 1 H r 'E- 1 z = Wz. Observe that 
HWH — if, and recall that z = Hx + v. For any matrix norm, we have 

\\z-Hx wls \\ = \\z-HWz\\ = - HW)(Hx + v)\\ = \\Hx — Hx + (I — HW)v\\ < - HW)\\\\v\\, 

and the theorem follows. 



3.3 Diffusive state estimation via distributed computation 

The implementation of the incremental state estimation algorithm described in Section 3.2 requires a certain degree 
of coordination among the control centers. For instance, an ordering of the monitors is necessary, such that the i-th 
monitor transmits its estimate to the (i + l)-th monitor. This requirement imposes a constraint on the monitors 
interconnection structure, which may be undesirable, and, potentially, less robust to link failures. In this section, we 
overcome this limitation by presenting a diffusive implementation of Algorithm 1, which only requires the monitors 
interconnection structure to be connectedp"| To be more precise, let V — {1, . . . , m} be the set of monitors, and 
let G = (V, E) be the undirected graph describing the monitors interconnection structure, where E C V x V, and 
(i,j) S E if and only if the monitors i and j are connected. The neighbor set of node i is defined as Ni = {j G V : 
(i, j) £ E}. We assume that G is connected, and we let the distance between two vertices be the minimum number of 
edges in a path connecting them. Finally, the diameter of a graph G, in short diam(G), equals the greatest distance 
between any pair of vertices. Our diffusive procedure is described in Algorithm 2, where the matrices Hi and eBi 
are as defined in equation (8). During the h-ih iteration of the algorithm, monitor i, with i G {1, . . . , TV}, performs 
the following three actions in order: 

(i) transmits its current estimates Xi and Ki to all its neighbors; 

(ii) receives the estimates Xj from neighbors Ni) and 

(iii) updates Xi and Ki as in the for loop of Algorithm 2. 

We next show the convergence of Algorithm 2 to the minimum variance estimate. 

Theorem 3.3 (Convergence of Algorithm 2) Consider the system of linear equations z = Hx + v, where E[v] = 
and K[vv T ] = BB T . Let H, B and z be partitioned as in (8), and let e > 0. Let the monitors communication graph 
be connected, let d be its diameter, and let the monitors execute the Diffusive state estimation algorithm. Then, each 
monitor computes the estimate x(e) of x in d steps. 



PROOF. Let £i be the estimate of the monitor i, and let Ki be such that x — Xi € lm(Ki), where x denotes the 
network state, and Xi _L Im(Ki). Notice that Zi = [Hi sBi]xi, where Zi it the i-th measurements vector. Let i and j 



6 Given a vector v and a matrix H, we denote by ||w|| any vector norm, and by \\H\\ the corresponding induced matrix norm. 

7 An undirected graph is said to be connected if there exists a path between any two vertices [9] . 
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Algorithm 2 Diffusive state estimation (i-th monitor) 



Input: Hi, eB iy zf, 
it := [Hi eB^Zi] 
Ki := Basis(Ker([i? l eB,})); 
while K t ^ do 
for j e Ni do 

receive Xj and iij; 



X 7 



[Ki 0][-Ki Ktfixi-Xj), 



Ki := Basis(Im(.K i ) n lm( K j)h 
end for 

transmit Xi and Kf, 
end while 



be two neighboring monitors. Notice that there exist vectors Vi and Vj such that Xi + KiVi = xj+KjVj. In particular, 
those vectors can be chosen as 



= [-K i K j ]\x i -x j ). 



It follows that the vector 



X- Xi 



+ [K, 0][-Ki Ktftfi-Xj) 



is such that Zi — [Hi eBi]x)~ and Zj = [Hj eBAxf. Moreover we have xf _L (Im(i'Q) n lm(Kj)). Indeed, notice that 



3 J 



J_ Kev([-Ki K 3 \) D 




: KiWi = KjWj 



We now show that K^i _L lm(Kj). By contradiction, if KiVi / lm(Kj), then i>j = Vi + Uj, with KiVi _L Im(.Kj) 
and i^iSi G Im(i^j). Let Uj = KjKiVi, and f)j = Uj — Uj. Then, [uj wJ] T e Kcr([— ^]), and hence [vj vJ] T / 
Ker([— i^j which contradicts the hypothesis. We conclude that [i^ 0][— Ki Kjy(xi — xf) _L lm{Kf), and, 

since iEj _L lm(Ki), it follows _L (Im(Ki) n Im(.K'j)). The theorem follows from the fact that after a number of 
steps equal to the diameter of the monitors communication graph, each vector Xi verifies all the measurements, and 
x i ±Im(K 1 )r\---r\Im(K m ). 



As a consequence of Theorem 3.2, in the limit for e to zero, Algorithm 2 returns the minimum variance estimate 
of the state vector, being therefore the diffusive counterpart of Algorithm 1. A detailed comparison between incre- 
mental and diffusive methods is beyond the purpose of this work, and we refer the interested reader to [15,16] and 
the references therein for a thorough discussion. Here we only underline some key differences. While Algorithm 1 
requires less operations, being therefore computationally more efficient, Algorithm 2 does not constraint the monitors 
communication graph. Additionally, Algorithm 2 can be implemented adopting general asynchronous communica- 
tion protocols. For instance, consider the Asynchronous (diffusive) state estimation algorithm, where, at any given 
instant of time in N, at most one monitor, say j, sends its current estimates to its neighbors, and where, for i e Nj, 
monitor i performs the following operations: 

(i) Xi := Xi + [Ki 0][-Ki K^{xi - %), 

(ii) K t := Basis(Im(i^) n Im^)). 

Corollary 3.3 (Asynchronous estimation) Consider the system of linear equations z = Hx + v, where E[v] = 
and E[vv T ] = BB T . Let H , B and z be partitioned as in (8), and let e > 0. Let the monitors communication graph be 
connected, let d be its diameter, and let the monitors execute the Asynchronous (diffusive) state estimation algorithm. 
Assume that there exists a duration TeN such that, within each time interval of duration T, each monitor transmits 
its current estimates to its neighbors. Then, each monitor computes the estimate x{e) of x within time dT . 
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Algorithm 3 False data detection (i-th monitor) 

Input: H- L , sB i} T; 
while True do 

collect measurements Zi{t); 

estimate network state x(t) via Algorithm 1 or 2 
if \\zi{t) - Hi&tyWoo >rthen 
return False data detected; 
end if 
end while 



PROOF. The proof follows from the following two facts. First, the intersection of subspaces is a commutative 
operation. Second, since each monitor performs a data transmission within any time interval of length T, it follows 
that, at time dT, the information related to one monitor has propagated through the network to every other monitor. 



3.4 Detection of false data via distributed computation 

In the previous sections we have shown how to compute an optimal state estimate via distributed computation. A 
rather straightforward application of the proposed state estimation technique is the detection of false data among 
the measurements. When the measurements are corrupted, the state to measurements relation becomes 

z = Hx + v + w, 

where w is the false data vector. As a consequence of Corollary 3.2, the vector w is detectable if it affects significantly 
the estimation residual, i.e., if lim e ^ \\ z — Hx{e)\\ > T, where the threshold T depends upon the magnitude of the 
noise v. Notice that, because false data can be injected at any time by a malignant agent, the detection algorithm 
needs to be executed over time by the control centers. Let z(t) be the measurements vector at a given time instant 
t, and let E[z(ii)z T (i2)] = for all t\ 7^ t%. Based on this considerations, our distributed detection procedure is in 
Algorithm 3, where the matrices Hi and eBi are as defined in equation (8), and T is a predefined threshold. 

In Algorithm 3, the value of the threshold T determines the false alarm and the misdetection rate. Clearly, if 
T > ||(/ — -ffW)||||t>(£)|| and e is sufficiently small, then no false alarm is triggered, at the expenses of the misdetection 
rate. By decreasing the value of V the sensitivity to failures increases together with the false alarm rate. Notice that, if 
the magnitude of the noise signals is bounded by 7, then a reasonable choice of the threshold is V = ^{1 — HW^^o, 
where the use of the infinity norm in Algorithm 3 is also convenient for the implementation. Indeed, since the 
condition \\z(t) — Hx^t)^ > T is equivalent to \\zi(t) ~ HiX (t)||oo > T for some monitor i, the presence of false data 
can be independently checked by each monitor without further computation. Notice that an eventual alarm message 
needs to be propagated to all other control centers. 

Remark 4 (Statistical detection) A different strategy for the detection of false data relies on statistical tech- 
niques, e.g., see [1]. In the interest of brevity, we do not consider these methods, and we only remark that, once the 
estimation residual has been computed by each monitor, the implementation of a (distributed) statistical procedure, 
such as, for instance, the (distributed) y^-Test, is a straightforward task. 



4 A finite-memory estimation technique 

The procedure described in Algorithm 1 allows each agent to compute an optimal estimate of the whole network 
state in finite time. In this section, we allow each agent to handle only local, i.e., of small dimension, vectors, and 
we develop a procedure to recover an estimate of only a certain subnetwork. We envision that the knowledge of only 
a subnetwork may be sufficient to implement distributed estimation and control strategies. 

We start by introducing the necessary notation. Let the measurements matrix H be partitioned into m 2 , being m 
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the number of monitors in the network, blocks as 



H = 



Hn ■ ■ ■ Hi m 

Hml ' ' ' H nl m 



(9) 



where Hu £ 



for all i,j € {1, 



i}. The above partitioning reflects a division of the whole network into 



competence regions: we let each monitor be responsible for the correct functionality of the subnetwork defined by its 
blocks. Additionally, we assume that the union of the different regions covers the whole network, and that different 
competence regions may overlap. Observe that, in most of the practical situations, the matrix H has a sparse 
structure, so that many blocks Hy have only zero entries. We associate an undirected graph Gh with the matrix H, 
in a way that Gh reflects the interconnection structure of the blocks Hij. To be more precise, we let Gh = (Vh,£h), 
where Vh — {1, . . . , m} denotes the set of monitors, and where, denoting by the undirected edge from j to i, it 
holds € Eh if and only if ^ or \\Hji\\ ^ 0. Noticed that the structure of the graph Gh, which reflects 

the sparsity structure of the measurement matrix, describes also the monitors interconnections. By using the same 
partitioning as in (9), the Moore-Penrose pseudoinverse of H can be written as 



= H 



Hml ' ' ' H rf 



(10) 



where H^ £ 



ixrni _ Assume that H has full row rankp"| and observe that W = H J (HH T ) 1 . Consider the 
equation z = Hx, and let H^ z = x = [xj . . . x m ], where, for all i € {1, . . . ,m}, Xi € M n< . We employ Algorithm 2 
for the computation of the vector x, and we let 



be the estimate vector of the i-th monitor after h iterations of Algorithm 2, i.e., after h executions of the while loop 



in Algorithm 2. In what follows, we will show that, for a sufficiently sparse matrix H, the error 



has an 



exponential decay when h increases, so that it becomes negligible before the termination of Algorithm 2, i.e., when 
h < diam(G/ l ). The main result of this section is next stated. 

Theorem 4.1 (Local estimation) Let the full-row rank matrix H be partitioned as in (9). Let [a, b], with a < b, 
be the smallest interval containing the spectrum of HH T . Then, for i £ {1, . . . , m} and h G N, there exists C £ R>o 
and q £ (0, 1) such that 



< Cq 



Before proving the above result, for the readers convenience, we recall the following definitions and results. Given 
an invertible matrix M of dimension n, let us define the support sets 

h 

S h (M)=\J{(i,j):M k (i,j)^0}, 

being M k (i,j) the (i,j)-th entry of M k , and the decay sets 

D h (M) = ({l,...,n}x{l,...,n})\S h (M). 



The case of a full-column rank matrix is treated analogously. 
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Theorem 4.2 (Decay rate [7]) Let M be of full row rank, and let [a,b], with a < b, be the smallest interval 
containing the spectrum of M. There exist C <G lR>o and q € (0, 1) such that 

sup{|Aft(i, j)\ : e D h (MM T )} < Cq h+1 . 



For a graph Gh and two nodes i and j, let dist(z, j) denote the smallest number of edges in a path from j to i in Gh- The 
following result will be used to prove Theorem 4.1. Recall that, for a matrix M, we have ||M|| max = max{|M(i,j)|}. 

Lemma 4.1 (Decay sets and local neighborhood) Let the matrix H be partitioned as in (9), and let Gh be the 

graph associated with H. For i,j e {1, . . . , m}, if dist(i,j) = h, then 

Kl| m ax<C^ +1 . 



PROOF. The proof can be done by simple inspection, and it is omitted here. 



Lemma 4.1 establishes a relationship between the decay sets of an invertible matrix and the distance among the 
vertices of a graph associated with the same matrix. By using this result, we are now ready to prove Theorem 4.1. 



PROOF. [Proof of Theorem 4.1] Notice that, after h iterations of Algorithm 2, the i-th monitor has received data 
from the monitors within distance h from i, i.e., from the monitors T such that, for each j G T, there exists a path 
of length up to h from j to i in the graph associated with H . Reorder the rows of H such that the i-th block come 
first and the T-th blocks second. Let H = [Hj Hj Hj] T be the resulting matrix. Accordingly, let z = [zj zj zJ] T , 



and let x 



„TlT 



where z = Hx. 



Because H has full row rank, we have 



Pll Pll Pl3 
P21 P22 P23 
P3I P32 P33 












' Pu 


P12 


Pl3 







h 




P21 


P22 


P23 







h 




P31 


P32 


P33 _ 



where I\, I2, and ^3 are identity matrices of appropriate dimension. For a matrix M, let col(M) denote the number 
of columns of M. Let T x = {1, . . . , col(P n )}, T 2 = {1 + col(P n ), . . . , col([P n P 12 ])}, and 

T 3 = {1 + col([Pu P12]), • • • ,col([P n P 12 P 13 ])}. 

Let Ti, T 2 , and T 3 , be, respectively, the indices of the columns of Pn, P12, and P13. Notice that, by construction, if 
i € T\ and j e T3, then dist(i, j) > h. Then, by virtue of Lemma 4.1 and Theorem 4.2, the magnitude of each entry 
of P13 is bounded by Cq^ J +1 , for C,qeR. 

Because H has full row rank, from Theorem 3.1 we have that 

x = H ] z = x + K^HsK^izs - H z x x ), (11) 

where 

x = [Hj H]] T ^[zJ zlY and K l = Basis (Ker([Hj Hj] T )). 

With the same partitioning as before, let x = [xj x\ xJ] T . In order to prove the theorem, we need to show that there 
exists C G M >0 and q e (0, 1) such that 

\\xi-xi\\ < CgLtJ+i. 
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(a) 



(I)) 



Fig. 2. In Fig. 2(a), the normalized euclidean norm of the error vector #bus(e) — #bus,wis is plotted as a function of the parameter 
e, where #bus(e) is the estimation vector computed according to Theorem 3.2, and #bus,wis is the minimum variance estimate 
of ^bus- As e decreases, the vector #bus(e) converges to the minimum variance estimate #bus,wis- In Fig. 2(b), the IEEE 118 bus 
system has been divided into 5 areas. Each area is monitored and operated by a control center. The control centers cooperate 
to estimate the state and to assess the functionality of the whole network. 

Notice that, for (11) to hold, the matrix K\ can be any basis of Ker([Hj Hj] T ). Hence, let K\ = \P~[ 3 Pj 3 Pj 3 }. 
Because every entry of Pi 3 decays exponentially, the theorem follows. 

In Section 5.2 we provide an example to clarify the exponential decay described in Theorem 4.1. 
5 An illustrative example 

The effectiveness of the methods developed in the previous sections is now shown through some examples. 
5.1 Estimation and detection for the IEEE 118 bus system 

The IEEE 118 bus system represents a portion of the American Electric Power System as of December, 1962. This 
test case system, whose diagram is reported in Fig. 1(b), is composed of 118 buses, 186 branches, 54 generators, and 
99 loads. The voltage angles and the power injections at the network buses are assumed to be related through the 
linear relation 

Pbus = Pbus#b US ; 

where the matrix Pbus depends upon the network interconnection structure and the network admittance matrix. 
For the network in Fig. 1(b), let z = Pbus — v be the measurements vector, where E[v] — and E[wu T ] = a 2 1, <r6l. 
Then, following the notation in Theorem 3.2, the minimum variance estimate of 6*bus can be recovered as 

lim [Pbus eal^z. 

In Fig. 2(a) we show that, as e decreases, the estimation vector computed according to Theorem 3.2 converges to 
the minimum variance estimate of #bus- 

In order to demonstrate the advantage of our decentralized estimation algorithm, we assume the presence of 5 
control centers in the network of Fig. 1(b), each one responsible for a subpart of the entire network. The situation is 
depicted in Fig. 2(b). Assume that each control center measures the real power injected at the buses in its area, and 
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Fig. 3. For a fixed value of e, Fig. 3(a) shows the average (over fOO tests) of the norm of the error (with respect to the network 
state) of the estimate obtained by means of Algorithm 1. The estimation error decreases with the number of measurements. 
Because of the presence of several control centers, our distributed algorithm processes more measurements (up to 5JV) while 
maintaining the same (or smaller) computational complexity of a centralized estimation (with N measurements). Fig. 3(b) 
shows the residual functions computed by the 5 control centers. Since the first residual is greater than the threshold value, the 
presence of false data is correctly detected by the first control center. A form of regional identification is possible by simple 
identifying the residuals above the security threshold. 

let Zi = -Pbus.i — Vi, with E[uj] = and E[u,"yT] = <rfl, be the measurements vector of the i-th area. Finally, assume 
that the i-th control center knows the matrix H^ ua i such that Zj — -ffbus,i$bus + v i- Then, as discussed in Section 3, 
the control centers can compute an optimal estimate of 6*bus by means of Algorithm 1 or 2. Let rii be the number of 
measurements of the i-th. area, and let N = X)i=i n i- Notice that, with respect to a centralized computation of the 
minimum variance estimate of the state vector, our estimation procedure obtains the same estimation accuracy while 
requiring a smaller computation burden and memory requirement. Indeed, the i-th monitor uses ni measurements 
instead of TV. Let N be the maximum number of measurements that, due to hardware or numerical contraints, 
a control center can efficiently handle for the state estimation problem. In Fig. 3(a), we increase the number of 
measurements taken by a control center, so that rii < N, and we show how the accuracy of the state estimate 
increases with respect to a single control center with N measurements. 



To conclude this section, we consider a security application, in which the control centers aim at detecting the presence 
of false data among the network measurements via distributed computation. For this example, we assume that each 
control center mesures the real power injection as well the current magnitude at some of the buses of its area. By 
doing so, a sufficient redundancy in the measurements is obtained for the detection to be feasible [1]. Suppose that 
the measurements of the power injection at the first bus of the first area is corrupted by a malignant agent. To 
be more precise, let the measurements vector of the first area be Zj = Zj + eiWi, where e\ is the first canonical 
vector, and Wi is a random variable. For the simulation we choose Wi to be uniformly distributed in the interval 
[0, w m ax]) where w max corresponds approximately to the 10% of the nominal real injection value. In order to detect 
the presence of false data among the measurements, the control centers implement Algorithm 3, where, being H 
the measurements matrix, and cr, E the noise standard deviation and covariance matrix, the threshold value T is 
chosen as 2<t||I— H(H T Y,~ 1 H)~ 1 H T Y,~ 1 \\ oa [®~\The residual functions ||zj —HxWoo are reported in Fig. 3(b). Observe 
that, since the first residual is greater than the threshold T, the control centers successfully detect the false data. 
Regarding the identification of the corrupted measurements, we remark that a regional identification may be possible 
by simply analyzing the residual functions. In this example, for instance, since the residuals {2, . . . , 5} are below the 
threshold value, the corrupted data is likely to be among the measurements of the first area. This important aspect 
is left as the subject of future research. 



5.2 Scalability property of our finite-memory estimation technique 

Consider an electrical network with (ab) 2 buses, where a, 6 G N. Let the buses interconnection structure be a two 
dimensional lattice, and let G be the graph whose vertices are the (ab) 2 buses, and whose edges are the network 
branches. Let G be partitioned into b 2 identical blocks containing a 2 vertices each, and assume the presence of b 2 
control centers, each one responsible for a different network part. We assume the control centers to be interconnected 
through an undirected graph. In particular, being Vi the set of buses assigned to the control center Q , we let the 



9 For a Gaussian distribution with mean /i and variance a 2 , about 95% of the realizations are contained in [fj, — 2a, u + 2a]. 
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Fig. 4. In Fig. 4(a), a two dimensional power grid with 400 buses. The network is operated by 16 control centers, each one 
responsible for a different subnetwork. Control centers cooperate through the red communication graph. Fig. 4(b) shows the 
norm of the estimation error of the local subnetwork as a function of the number of iterations of Algorithm 2. The considered 
monitors are C\ ,Ce, Cii, and Cig. As predicted by Theorem 4.1, the local estimation error becomes negligible before the 
termination of the algorithm. 

control centers C j and Cj be connected if there exists a network branch linking a bus in V* to a bus in Vj . An example 
with b = 4 and a = 5 is in Fig. 4(a). In order to show the effectiveness of our approximation procedure, suppose that 
each control center C, aims at estimating the vector of the voltage angles at the buses in its region. We assume also 
that the control centers cooperate, and that each of them receives the measurements of the real power injected at 
only the buses in its region. Algorithm 2 is implemented by the control centers to solve the estimation problem. In 
Fig. 4(b) we report the estimation error during the iterations of the algorithm. Notice that, as predicted by Theorem 
4.1, each leader possess a good estimate of the state of its region before the termination of the algorithm. 



6 Conclusion 



Two distributed algorithms for network control centers to compute the minimum variance estimate of the network 
state given noisy measurements have been proposed. The two methods differ in the mode of cooperation of the 
control centers: the first method implements an incremental mode of cooperation, while the second uses a diffusive 
interaction. Both methods converge in finite time, which we characterize, and they require only local measurements 
and model knowledge to be implemented. Additionally, an asynchronous and scalable implementation of our diffusive 
estimation method has been described, and its efficiency has been shown through a rigorous analysis and through 
a practical example. Based on these estimation methods, an algorithm to detect cyber-attacks against the network 
measurements has also been developed, and its detection performance has been characterized. 

APPENDIX 

6. 1 Proof of Theorem 3. 1 



PROOF. Let H l = [Hj ■ ■ ■ Hj] T , z i = [zj ■■■ zJ] T . We show by induction that z i = H%, K, = Basis(Ker(£P)), 
and Xi _L Ker(iP). Note that the statements are trivially verified for i = 1. Suppose that they are verified up to i, 
then we need to show that K i+ i = Basis(Ker(i/ 2+1 )), x i+ i _L Ker(H l+1 ), and z l+1 — H l+1 Xi + i. 

We start by proving that JQ+i = Basis(Ker(i7 I+1 )). Observe that Ker(Ki) = for all i, and that 

Ker(H i+1 Ki) = {v : K,v G Ker(iJ 4+1 )}. (A-l) 

Hence, 

Im(K l+1 ) = Im(Ki Ker(H i+1 Ki)) - Im^) n Ker(H i+1 ) = Ker(H l ) n Ker(H l+1 ) - Ker(iT +1 ). 
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We now show that x i+ i _L Ker(fP +1 ), which is equivalent to 

Xi+i = (xi + K i (H i+1 Krf(z i+1 - H i+1 Xi)) £ Kex(H i+1 ) ± . 

Note that 

Ker(iT +1 ) C Ker(iT) <^ Ker(H l+1 ) ± D Ke^/P)- 1 . 

By the induction hypothesis we have Xi £ Kci(H l ) ± , and hence Xi £ Kei(H t+1 ) ± . Therefore, we need to show that 

K i (H i+1 K i )\z i+1 - H i+1 Xi) £ Kcr(^+ 1 ) ± . 

Let w = (Hi+iKiY (z i+ i — H i+ iXi), and notice that w £ Ker(H i+ iK i ) ± due to the properties of the pseudoinverse 
operation. Suppose that KiW £ Kcr(i? i+1 )- L . Since Ker(/Q) = {0}, the vector w can be written as w = W1+W2, where 
KiW\ £ Ker(iJ i+ i)- L and KiWi — KiW — KiW\ 7^ 0, KiW 2 £ Ker(i?j + i). Then, it holds H i+ \KiWi = 0, and hence 
u> 2 £ Ker(H i+ iKi), which contradicts the hypothesis w £ Ker(7J i+ ii i 4r i )- L . Finally K t w £ Ker(_ff J+ i)- L C Ker(iJ I+1 )- L . 

We now show that z l+1 — H l+1 x i+ i. Because of the consistency of the system of linear equations, and because z % = 
H % Xi by the induction hypothesis, there exists a vector Vi £ Ker(iP) = lm(Ki) such that z l+1 — H l+1 (xi + Vi), and 
hence that z%+i = H i+ i(xi We conclude that {z i+1 — H i+ iXi) £ Im(H i+ iKi), and finally that z l+1 = H l+1 x i+1 . 

6.2 Proof of Theorem 3.2 

Before proceeding with the proof of the above theorem, we recall the following fact in linear algebra. 
Lemma 6.1 Let H £ R nxm . Then Kcr((#t) T ) = Kei(H). 

PROOF. We first show that Ker((iJt) T ) c Ker(#). Recall from [2] that H = HH T (H^) T . Let x be such that 
(W) T x = 0, then Hx = HH T (W) T x = 0, so that Ker((fft)T) g Kcr(H). We now show that Ker(H") C Kei((W) T ). 
Recall that (i?t) T = (i/ T )t = (HH J )^H. Let x be such that Hx = 0, then (H^) T x = (HH J ^Hx = 0, so that 
Ker(H) C Ker((.fft) T ), which concludes the proof. 

We are now ready to prove Theorem 3.2. 

PROOF. The first property follows directly from [2] (cfr. page 427). To show the second property, observe that 
C* = \{{I-HH^)B)\ so that 

lim eD = 0. 

For the theorem to hold, we need to verify that 

F f - H*B((I - HH ] )B)^ = {H T Y.- 1 H)- 1 H T Y.- 1 , 

or, equivalently, that 

(if* - H^B((I - HH^B^) HH^ = (H j ^- 1 H)- 1 H t ^- 1 HH\ (A-2) 

and 

(fft - H^B({I - HH*)Bjt) (I - HH^) = (£r T E- 1 ff)- 1 tf T £- 1 (I - HH*). (A-3) 
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Consider equation (A-2). After simple manipulation, we have 

ff f - H*B((I - HH*)B)*HH* = H\ 

so that we need to show only that 

H^B((I - HH^B^HH^ = 0. 

Recall that for a matrix W it holds Wt = (W T Wy\V T . Then the term ((7 - HH^B^HH^ equals 

(((/ - HH^B) T ((I - HH^B)) 1 B T (I - HH^HH* = 0, 

because (I — HH^)HH^ = 0. We conclude that equation (A-2) holds. Consider now equation (A-3). Observe that 
HW(I — HW) = 0. Because B has full row rank, and X = BB T , simple manipulation yields 

- H J (BB T )- 1 HH^B [(I - HH^)B]\l - HH^B = H T (BB T )- 1 (I - HH^)B, 

and hence 

H T (BB T )- 1 i^I + HH^B [(I - HH*)B]*} (I - HH^)B = 0. 

Since HW = I - (I - HW), we obtain 

H T (BB J )- 1 B [(I - HH^B] 1 (I - HH^)B = 0. 
A sufficient condition for the above equation to be true is 

( [(/ - HHl)B] f ) T B T (BB T y 1 H = 0. 



From Lemma 6.1 we have. 



Since 



Ker AA^)B] i y^j = Ker((7 — AA^)B). 



(I - HH^)BB T (BB T )- 1 H = (I - HH^)H = 0, 



we have that 



H T (BB T )- 1 B [(I - HH^B^ (I - HH^)B = 0, 
and that equation (A-3) holds. This concludes the proof. 
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